A Decoupled Fetch-Execute Engine with Static Branch Prediction Support

نویسندگان

  • Arthur A. Bright
  • Jason Fritts
  • Michael Gschwind
چکیده

We describe a method for supporting static branch prediction on a decoupled fetch-execute pipeline. Using instruction buffers to decouple instruction fetch from the execute pipeline is an effective way to minimize instruction cache penalties by allowing instruction fetch and stall miss handling to proceed independent of the execution pipeline. Dynamic branch prediction is typically used with such architectures, but it is not necessary to assume the cost of dynamic branch hardware when static prediction is sufficient. Traditional static branch prediction approaches were designed for lock-step pipelines and do not adapt well to decoupled fetch-execute pipelines, so alternative means of support were required. We describe the requirements for achieving efficient static branch prediction on a decoupled fetch-execute architecture, and presents the design and results for an implementation on an EPIC-style target architecture.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Latency Tolerant Branch Predictors

The access latency of branch predictors is a well known problem of fetch engine design. Prediction overriding techniques are commonly accepted to overcome this problem. However, prediction overriding requires a complex recovery mechanism to discard the wrong speculative work based on overridden predictions. In this paper, we show that stream and trace predictors, which use long basic prediction...

متن کامل

A latency-conscious SMT branch prediction architecture

Executing multiple threads has proved to be an effective solution to partially hide latencies that appear in a processor. When a thread is stalled because a long-latency operation is being processed, like a memory access or a floatingpoint calculation, the processor can switch to another context so that another thread can take advantage of the idle resources. However, fetch stall conditions cau...

متن کامل

On the Performance of Fetch Engines Running DSS Workloads

This paper examines the behavior of current and next generation microprocessors’ fetch engines while running Decision Support Systems (DSS) workloads. We analyze the effect of the latency of instructions being fetched, their quality and the number of instructions that the fetch engine provides per access. Our study reveals that a well dimensioned fetch engine is of great importance for DSS perf...

متن کامل

Tolerating Branch Predictor Latency on SMT

Simultaneous Multithreading (SMT) tolerates latency by executing instructions from multiple threads. If a thread is stalled, resources can be used by other threads. However, fetch stall conditions caused by multi-cycle branch predictors prevent SMT to achieve all its potential performance, since the flow of fetched instructions is halted. This paper proposes and evaluates solutions to deal with...

متن کامل

Optimizations Enabled by a Decoupled Front-End Architecture

ÐIn the pursuit of instruction-level parallelism, significant demands are placed on a processor's instruction delivery mechanism. Delivering the performance necessary to meet future processor execution targets requires that the performance of the instruction delivery mechanism scale with the execution core. Attaining these targets is a challenging task due to I-cache misses, branch mispredictio...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1999